Classification of incomplete data using the fuzzy ARTMAP neural network

Authors

  • Eric Granger
  • Mark A. Rubin
  • Pierre Lavoie
Abstract

The fuzzy ARTMAP neural network is used to classify data that is incomplete in one or more ways. These include a limited number of training cases, missing components, missing class labels, and missing classes. Modifications for dealing with such incomplete data are introduced, and performance is assessed on an emitter identification task using a database of radar pulses.

1 A taxonomy of data incompleteness

Data presented to a classifier, during either the training or testing phases, may be incomplete in one or more ways:

1. Limited number of training cases: It is of interest to know how the performance of the classifier declines as the amount of training data is decreased, so that, e.g., more training data may be gathered, if necessary, before the classifier is put to use.

2. Missing components of the input patterns: For example, the information in the different components of the input patterns may come from different sensors, one or more of which may be temporarily unavailable.

3. Missing class labels during training: Some of the training data may have missing class labels. This is referred to as "semi-supervised learning" (Demiriz et al., 1999) or "partially supervised clustering" (Bensaid et al., 1996). ("Missing class labels during testing" is, of course, just the usual situation.)

4. Missing classes: Some classes that were not present in the training set may be encountered during testing. When the classifier encounters a pattern belonging to such an unfamiliar class, it should "flag" the pattern as belonging to an unfamiliar class, rather than making a meaningless guess as to its identity. This may be implemented by using "familiarity discrimination" (Carpenter et al., 1997).

(a) Pure familiarity discrimination. As is common practice when evaluating the performance of a classifier, the classifier does not learn during the testing phase. Test patterns that are flagged as unfamiliar are not processed further.
In addition to high accuracy of classification of familiar patterns, the quality of the classifier is measured by a high "hit rate" (the fraction of familiar-class test patterns correctly declared to belong to familiar classes and then classified, correctly or not) and a low "false alarm rate" (the fraction of unfamiliar-class test patterns incorrectly declared familiar by the classifier).

(b) Learning of unfamiliar classes (LUC). The classifier continues to learn during testing. When an unfamiliar class is flagged, the classifier defines a new class, and the criteria for familiarity discrimination are adjusted as necessary. Subsequent test patterns may be declared by the classifier to be "familiar" and classified as belonging either to classes encountered during training or to the "newly-minted" classes; or they may be declared to be "unfamiliar," in which case another new class will be defined. (The normal adjustment of weight values during learning is also allowed during this phase.) The false-alarm rate for an LUC classifier is the fraction of unfamiliar-class (i.e., not encountered during the training phase) test patterns that are neither flagged as unfamiliar nor assigned to a "new" node defined during testing. An additional figure of merit for an LUC classifier is a "purity measure" such as the Rand score (Hubert and Arabie, 1985), which rewards the classifier for assigning test patterns belonging to different unfamiliar classes to different classes defined during testing, while penalizing it for creating too large a number of new classes during testing.

(Footnote 4: Address after March 1, 2000: Sensor Exploitation Group, MIT Lincoln Laboratory, 244 Wood St, Lexington, MA 02420.)

In this paper we present methods for dealing with the above types of incomplete data using the fuzzy ARTMAP neural network (Carpenter et al., 1992) for classification. These methods are tested on a radar pulse data set that is described in Section 2.
The details of the methods, and the results of their application to the radar pulse data, are described in Sections 3-6.

2 Radar pulse data

The data set used consists of approximately 100,000 consecutive radar pulses gathered over 16 seconds during a field trial by the Defense Research Establishment Ottawa. Each of these pulses was produced by one of fifteen different radar types. After the trial, an ESM (electronic support measures) analyst manually separated trains of pulses coming from different emitters. Each pulse $j$ was then assigned a class label $C_j \in \{1, \ldots, 15\}$, corresponding to the emitter type from which the analyst determined it to have come (see footnote 1). Since ESM trials are complex and never totally controlled, not all pulses can be tagged and a residue is obtained. Residual pulses were discarded for this study.

The input pattern $a_j$ corresponding to the $j$th pulse has three components: $a_j = (RF_j, PW_j, PRI_j)$. RF is the radio frequency of the carrier wave, PW is the pulse width (temporal extent of the pulse), and PRI is the pulse repetition interval. The RF and PW components are by their nature associated with each individual pulse, whereas PRI is derived from the time-of-arrival (TOA) of pulses from a single emitter. For simplicity, we assume that, as part of the preprocessing, a TOA deinterleaver (Wiley, 1993) has correctly sorted the $N_k$ pulses belonging to each active emitter type $k$, $k = 1, \ldots, 15$, and has computed, for each pulse $j$, $PRI_j = TOA_j - TOA_{j'}$, where $j'$ is the pulse immediately preceding pulse $j$ in the train of pulses coming from the emitter which produced pulse $j$. Note that the first pattern from each emitter mode is omitted, since it has no preceding pulse from which a PRI can be computed. Also, due to the circular scanning action of some radar emitters, pulses are recorded in bursts. The first pulse of each scan (or burst) is also omitted. Finally, the components of $a_j$ were rescaled so that $a_{ji} \in [0,1]$. This is required for the application of fuzzy ARTMAP.
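The preprocessing described above can be illustrated with a minimal sketch. It assumes the pulses are already deinterleaved and tagged with an emitter identifier, and it assumes min-max rescaling, which the text does not specify (it says only that components were rescaled to [0,1]). The function names and the tuple layout are hypothetical, and the omission of the first pulse of each burst is not modelled.

```python
from collections import defaultdict


def compute_pri(pulses):
    """Build (RF, PW, PRI) patterns from deinterleaved pulses.

    pulses: iterable of (toa, emitter_id, rf, pw) tuples (hypothetical layout).
    The first pulse of each emitter is dropped because it has no predecessor.
    """
    by_emitter = defaultdict(list)
    for toa, emitter, rf, pw in pulses:
        by_emitter[emitter].append((toa, rf, pw))

    patterns = []
    for train in by_emitter.values():
        train.sort()  # order each pulse train by time of arrival
        for (prev_toa, _, _), (toa, rf, pw) in zip(train, train[1:]):
            patterns.append((rf, pw, toa - prev_toa))  # PRI_j = TOA_j - TOA_j'
    return patterns


def rescale(patterns):
    """Min-max rescale every component to [0, 1] (an assumed choice of scaling)."""
    columns = list(zip(*patterns))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(p, lows, highs))
        for p in patterns
    ]
```

A constant component (where the maximum equals the minimum) is mapped to 0.0 here to avoid division by zero; any other convention would serve equally well for illustration.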
Once deinterleaved and tagged, the data set used to train and test the classifier contains 52,192 radar pulses from 34 modes, each one belonging to one of the 15 different radar types. The data feature bursts of high pulse densities, multiple emitters of the same type, modes with overlapping parametric ranges, radars transmitting at different pulse rates, and emitters switching modes. The sophistication of the radar types ranges from simple (constant RF and PRI) to fairly complex (pulse-to-pulse RF and PRI agility). The data also contain direction of arrival (DOA) information, but this is not used here.

(Footnote 1: A mode number was also assigned to each pulse. A single type of radar can use several modes to perform various functions. We do not here attempt to classify the pulses according to mode, so this label will be ignored.)

3 Limited number of training cases

To avoid the problem of node proliferation that can arise when identical or nearly-identical input patterns in the training data correspond to different classes, a fuzzy ARTMAP variant termed MT- (Carpenter and Markuzon, 1998) is employed throughout this paper. After an incorrect prediction during training, the vigilance parameter is raised just enough to induce a search for another internal cluster, then lowered by a small positive amount. Simulations have indicated (Granger et al., 1999a) that, compared to several other variants of ARTMAP, as well as radial basis function and k-nearest neighbor (kNN) classifiers, this algorithm provides the most effective classification of the present data set in terms of accuracy and computational complexity (compression and convergence time).

The radar pulse data set was partitioned into training and test sets. 50% of the data from each radar type was selected at random to form the training set. Then, training set patterns $a_j$ were repeatedly presented, in order of TOA, to each classifier along with their class labels $C_j$ until convergence was reached; that is, until the sum-squared-fractional-change (SSFC) of prototype weights was less than 0.001 for two successive epochs. An epoch is defined as a presentation of the training set to a classifier in a TOA sequence. Finally, the test set (the complete data set less the training data) was presented to the trained classifier for prediction. The results presented are averages over 20 random selections of the data to be used for training. Error bars are standard errors. The kNN classifier is shown for comparison.

Fig. 1(a) shows the effect on classification accuracy of reducing the amount of training data. Even when only 0.5% of the training data (about 130 pulses) is used, accuracy on the independent test set is 91.4%, compared to 99.6% when all the data is used. The notion that additional training examples beyond a certain point become "redundant" is borne out by Fig. 1(b), which shows compression increasing significantly as the number of training patterns is increased. (Compression refers to the ratio of training patterns to hidden layer nodes, a measure of efficiency of information storage.)

Figure 1: Limited number of training cases. (a) Fuzzy ARTMAP and kNN (k = 1) accuracy. (b) Fuzzy ARTMAP compression.

4 Missing components of the input patterns

Three strategies for addressing missing input components in fuzzy ARTMAP are:

1. Replacement by "0": This strategy has been employed as part of the testing phase of Incremental ART (Aguilar and Ross, 1994). Input patterns $a$ are fed to an F1 layer that implements partial feature vector complement coding, which allows for recognition based on a feature vector $A'$. This vector has the usual complement-coded form, $A' = (a, a^c)$, except that both the "on" component $a_i$ and the corresponding complement-coded "off" component $a_i^c$ of $A'$ are set equal to 0 when the $i$th component of the input pattern $a$ is missing.
We extended this approach to include the learning phase (see Table 1).

2. Replacement by "1": Alternatively, both the on and off components of the complement-coded input pattern $A'$ can be set equal to 1 when the $i$th component is absent. With this strategy, as $|A'|$ grows, the vigilance test $|w_j \wedge A'| / |A'| > \rho$ becomes harder to pass. To compensate, the denominator $|A'|$ is replaced by a fixed value $M$ (the value that $|A'|$ takes in the absence of missing components).

3. Indicator vector: An indicator vector (Little and Rubin, 1987) $\delta = (\delta_1, \delta_2, \ldots, \delta_{2M})$ informs the fuzzy ARTMAP network as to the absence or presence of each component of an input pattern: $\delta_i = 1$ if component $i$ is present, $\delta_i = 0$ if component $i$ is missing, for $i = 1, \ldots, M$, with $\delta_i \equiv \delta_{i-M}$ for $i = M+1, \ldots, 2M$. This strategy, unlike the other two, modifies the weight vector as well as the input vector in response to missing components.

Table 1 summarizes the operation of these three strategies (for notational convenience, the indicator vector $\delta$ appears in the learning rules for all three strategies).

Training was performed with 0.5% of the available training data. A percentage of the components of either the training or test vectors from each emitter type were randomly chosen to be "missing" (although, if a particular choice of missing components would have left the vector with no components, another random choice was made). Results are shown in Fig. 2. It can be seen that the indicator vector method performs better than replacement by "1" and much better than replacement by "0," whether components are missing during testing or training, while providing better compression than either.

Figure 2: Missing input pattern components. (a) Accuracy with missing components during testing only. Fuzzy ARTMAP with indicator vector (top curve), replacement by "1" (middle curve), and replacement by "0" (bottom curve). (b) Same as (a), but with components missing only during training.
(c) Compression. Fuzzy ARTMAP with indicator vector (top curve), replacement by "1" (middle curve), and replacement by "0" (bottom curve).

5 Missing class labels during training

To examine the ability of fuzzy ARTMAP to handle training data with missing class labels, the network is trained in two phases. During the first phase, involving supervised learning, the network is trained with a fixed amount of labeled data (0.5% of the available training data). During the second phase, involving unsupervised learning, the network is presented with a variable amount of unlabeled data. Using the fuzzy ART algorithm (Carpenter et al., 1991), with modifications described below, the network associates each unlabeled training input pattern with one of the already-existing internal categories and adjusts the weight vectors associated with that internal category as appropriate.

During the supervised-learning phase, the learning rate $\beta$ and baseline vigilance $\rho$ are kept at their respective default values $\beta = 1$ (fast learning) and $\rho = 0$. During the unsupervised-learning phase, smaller values of $\beta$ and larger values of the fixed vigilance parameter $\rho$ are used, as these have been found to improve the performance on the test set of the final trained classifier. Unlabeled patterns which cannot pass the vigilance test are discarded (i.e., no new internal category nodes are allocated during the unsupervised-learning phase).

Additional improvement in performance is obtained by applying, to any unlabeled pattern which has passed the vigilance test, a coactivation test. An unlabeled pattern is discarded if the activation level $T_J$ of the node with which it is associated is not larger than the activation level $T_{J_{next}}$ of the next-most-active node by a sufficiently great amount; i.e., if $T_J - T_{J_{next}} > f_{co}$ is not satisfied. (A value of $f_{co} = 0.05$ was used in the simulations.) Fig. 3 shows the effect of retaining training data with missing class labels.
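The handling of a single unlabeled pattern during the unsupervised-learning phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard fuzzy ART choice function, match criterion, and learning rule; the choice parameter `alpha`, the function names, and the interpretation of the "next-most-active node" as the node ranked immediately below the selected one are all assumptions. The defaults for the vigilance (0.7), learning rate (0.1), and coactivation parameter (0.05) are the values quoted in the text and in the Figure 3 caption.

```python
def complement_code(a):
    """Standard complement coding: A = (a, 1 - a)."""
    return list(a) + [1.0 - v for v in a]


def fuzzy_and_norm(x, y):
    """City-block norm of the component-wise minimum, |x ^ y|."""
    return sum(min(p, q) for p, q in zip(x, y))


def present_unlabeled(a, weights, rho=0.7, beta=0.1, f_co=0.05, alpha=0.001):
    """Try to absorb one unlabeled pattern into an existing category node.

    weights: list of complement-coded weight vectors, updated in place.
    Returns the index of the updated node, or None if the pattern is discarded.
    """
    A = complement_code(a)
    # Assumed: standard fuzzy ART choice function T_j = |A ^ w_j| / (alpha + |w_j|).
    T = [fuzzy_and_norm(A, w) / (alpha + sum(w)) for w in weights]
    order = sorted(range(len(weights)), key=lambda j: T[j], reverse=True)

    # Search nodes in order of activation for one that passes vigilance.
    for rank, J in enumerate(order):
        if fuzzy_and_norm(A, weights[J]) / sum(A) >= rho:
            break
    else:
        return None  # no node passes; no new node is allocated in this phase

    # Coactivation test against the next-most-active node (assumed: next in rank).
    if rank + 1 < len(order):
        J_next = order[rank + 1]
        if not (T[J] - T[J_next] > f_co):
            return None

    # Slow learning: w_J <- beta * (A ^ w_J) + (1 - beta) * w_J.
    weights[J] = [beta * min(x, w) + (1.0 - beta) * w
                  for x, w in zip(A, weights[J])]
    return J
```

A pattern that lies roughly midway between two categories fails the coactivation test and is dropped, which is the intended effect: ambiguous unlabeled data never modifies the weights.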
Although the approach described above did, with suitable choice of parameters, substantially reduce degradation of the performance of the trained classifier due to the inclusion of unlabeled data, performance significantly better than that achieved by simply discarding all of the unlabeled training data was never observed.

Figure 3: (a, b) Missing class labels during training (axis label: ratio of unlabelled to labelled data). $\beta = 0.1$ in (a) and $\rho = 0.7$ in (b) (see the text). (c) Typical ROC curve for familiarity discrimination without LUC. "A" is the actual operating point, "0" is the optimal operating point for the curve (minimum value of 1 - hit rate + false alarm rate).

6 Missing classes

The modification of fuzzy ARTMAP that deals with unfamiliar classes is ARTMAP-FD. This algorithm has been shown to effectively perform familiarity discrimination on simulated radar range profiles (Carpenter et al., 1997) and radar emitter data (Granger et al., 1999b). For the simulations, 13 classes were selected out of the 15 emitter type classes, and labeled patterns from these 13 (familiar) classes were presented to the network during the learning phase. The operating threshold was determined during the learning phase using the online method (Carpenter et al., 1997).

We first present the results of simulations in which no learning is allowed during the test phase. A hit rate of 99.7% and a false alarm rate of 3.2% were obtained. Accuracy on familiar-class patterns correctly flagged as such was 99.6%. The number of internal category nodes was 111. These results are averages over 20 selections of the 13 familiar classes. The selections were performed at random, with the restriction that selections leading to an insufficient number of unfamiliar-class test patterns (fewer than a thousand) were not allowed. A typical ROC curve from one of these selections is shown in Fig. 3(c).

We next present simulations in which learning continues during testing.
The LUC algorithm employed in these simulations is as described in Section 1, with two modifications. To allow us to focus on the effects of LUC (as opposed to learning with missing class labels), the weights associated with internal category nodes allocated during the learning phase are kept at fixed values. In addition, patterns that are declared by the network to be from unfamiliar classes are given a "second chance" to be associated with an existing node before a new unlabeled node is allocated, in order to prevent the generation of an excessively large number of internal category nodes during testing.

Specifically, a pattern declared unfamiliar by the network is subjected to a vigilance test at each of the "new" nodes, i.e., nodes that have been created during the test phase. If it passes this vigilance test, it is associated with the node with the highest vigilance value; i.e., that node $j$ out of all the new nodes for which $|A \wedge w_j|$ is largest. (In the simulations presented here, the vigilance parameter used was 0.8.) No adjustment of the node's weights is performed. If the pattern cannot in this way be associated with an already-existing new node, then a coactivation test is performed between the node $J$ to which the pattern was tentatively assigned prior to having been declared unfamiliar and each of the new nodes $j_{new}$, using a small coactivation parameter $f_{co} = 0.05$. If $T_J - T_{j_{new}} < f_{co}$ for any $j_{new}$, the pattern is associated with the node $j_{new}$ for which $T_J - T_{j_{new}}$ is smallest. (No weight adjustment takes place.) Only if neither of these options for association with an already-existing new node succeeds is a new node created.

A hit rate of 99.8% and a false alarm rate of 3.3% were obtained with LUC. Accuracy on familiar-class patterns correctly flagged as such was 99.6%. The number of internal category nodes was 117. The Rand score for the new nodes was 0.783.
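The "second chance" procedure described in this section can be sketched as below. This is an illustrative reading of the prose rather than the authors' code: the function name, the choice parameter `alpha`, the use of the standard fuzzy ART choice function for the activations of the new nodes, and the initialization of a freshly created node to the input pattern itself are all assumptions. The vigilance value 0.8 and the coactivation parameter 0.05 are taken from the text.

```python
def second_chance(A, T_J, new_weights, rho=0.8, f_co=0.05, alpha=0.001):
    """Associate a pattern declared unfamiliar with a node created during testing.

    A: complement-coded input pattern.
    T_J: activation of the node J tentatively chosen before the pattern
         was declared unfamiliar.
    new_weights: weight vectors of nodes created during the test phase;
         a new vector is appended only when both association attempts fail.
    Returns the index (into new_weights) of the associated node.
    """
    overlaps = [sum(min(a, w) for a, w in zip(A, wj)) for wj in new_weights]
    size_A = sum(A)

    # Step 1: vigilance test at every new node; pick the largest |A ^ w_j|.
    passing = [j for j, ov in enumerate(overlaps) if ov / size_A >= rho]
    if passing:
        return max(passing, key=lambda j: overlaps[j])

    # Step 2: coactivation test between J and each new node.
    # Assumed: T_j = |A ^ w_j| / (alpha + |w_j|) for the new nodes.
    gaps = [T_J - ov / (alpha + sum(wj))
            for ov, wj in zip(overlaps, new_weights)]
    close = [j for j, gap in enumerate(gaps) if gap < f_co]
    if close:
        return min(close, key=lambda j: gaps[j])

    # Step 3: neither option succeeded, so allocate a new node.
    new_weights.append(list(A))  # assumed initialization: copy of the pattern
    return len(new_weights) - 1
```

Existing new-node weights are never modified, matching the statement that no weight adjustment takes place when a pattern is associated through either test.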
7 Conclusions and discussion

We conclude that, for the present application, fuzzy ARTMAP provides a high level of accuracy and compression even when the amount of training data is limited; that the indicator-vector method of dealing with missing components causes the least degradation in accuracy and compression when input pattern components are missing during training and/or testing; that the use of the vigilance and coactivation tests can prevent performance degradation when training on data with missing class labels (although improvement in performance was not seen on this data set); and that ARTMAP-FD familiarity discrimination can identify patterns belonging to unfamiliar classes during training and testing, and can allow learning of unfamiliar classes to take place during testing.

The importance for the application under consideration in this paper of being able to perform familiarity discrimination during the test (operational) phase is evident: radar emitters can exhibit new modes at any time. The ability to perform LUC during training is anticipated also to be of great importance for this application. Preliminary simulations indicate that LUC can improve performance in situations where "pure" FD performs poorly.

Now, the task of providing training data for an emitter identification system involves slow, tedious labor by an ESM analyst, so it cannot be expected that more than a small fraction of the large amount of available data will be labeled for training. Furthermore, it cannot be assumed that all of the unlabeled training data comes from emitter classes that have been identified by the ESM analyst. With the modifications presented above, fuzzy ARTMAP should be able to mitigate performance degradation due to missing class labels, while being able to benefit by learning information hidden in the unlabeled data about as-yet unidentified classes.
Acknowledgments

This research was supported in part by the Defense Advanced Research Projects Agency and the Office of Naval Research ONR N00014-95-1-0409 (S. G. and M. A. R.), the National Science Foundation NSF IRI-97-20333 (S. G.), the Natural Sciences and Engineering Research Council of Canada (E. G.), and the Office of Naval Research ONR N00014-95-1-0657 (S. G.).

References

Aguilar, J. M. and Ross, W. D., 1994, "Incremental ART: A Neural Network for Recognition by Incremental Feature Extraction," In World Conference on Neural Networks-San Diego: 1994 International Neural Network Society Annual Meeting, June 5-9 1994, Vol. I, 593-598.

Bensaid, A. M., Hall, L. O., Bezdek, J. C., and Clarke, L. P., 1996, "Partially Supervised Clustering for Image Segmentation," Pattern Recognition, 29, 859-871.

Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H. and Rosen, D. B., 1992, "Fuzzy ARTMAP: A Neural Network Architecture for Incremental Supervised Learning of Analog Multidimensional Maps," IEEE Trans. on Neural Networks, 3:5, 698-713.

Carpenter, G. A., Grossberg, S. and Rosen, D. B., 1991, "Fuzzy ART: Fast Stable Learning and Categorization of Analog Patterns by an Adaptive Resonance System," Neural Networks, 4:6, 759-771.

Carpenter, G. A. and Markuzon, N., 1998, "ARTMAP-IC and Medical Diagnosis: Instance Counting and Inconsistent Cases," Neural Networks, 11, 323-336.

Carpenter, G. A., Rubin, M. A. and Streilein, W. W., 1997, "Threshold Determination for ARTMAP-FD Familiarity Discrimination." In C. H. Dagli et al., eds., Intelligent Engineering Systems Through Artificial Neural Networks, Volume 7, pp. 23-28.

Demiriz, A., Bennett, K. P., and Embrechts, M. J., 1999, "Semi-Supervised Clustering Using Genetic Algorithms," In C. H. Dagli et al., eds., Intelligent Engineering Systems Through Artificial Neural Networks 9, New York, NY: ASME Press, 809-814.

Granger, E., Grossberg, S., Lavoie, P., and Rubin, M. A., 1999a, "Comparison of Classifiers for Radar Emitter Type Identification," In C. H. Dagli et al., eds., Intelligent Engineering Systems Through Artificial Neural Networks 9, New York, NY: ASME Press, 3-11.

Granger, E., Grossberg, S., Rubin, M. A., and Streilein, W. W., 1999b, "Familiarity Discrimination of Radar Pulses," In M. S. Kearns et al., eds., Advances in Neural Information Processing Systems 11, Cambridge, MA: MIT Press, 875-881.

Hubert, L. and Arabie, P., 1985, "Comparing Partitions," Journal of Classification, 2, 193-218.

Little, R. J. A. and Rubin, D. B., 1987, Statistical Analysis with Missing Data. New York: Wiley.

Wiley, R. G., 1993, Electronic Intelligence: The Analysis of Radar Signals, 2nd ed., Artech House.

Publication date: 2000